AITopics | hybrid model

hybrid model

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Neural Information Processing Systems | Mar-21-2026, 03:14:09 GMT

Linear RNN architectures, like Mamba, can be competitive with Transformer models in language modeling while having advantageous deployment characteristics. Given the focus on training large-scale Transformer models, we consider the challenge of converting these pretrained models for deployment. We demonstrate that it is feasible to distill large Transformers into linear RNNs by reusing the linear projection weights from attention layers with academic GPU resources. The resulting hybrid model, which incorporates a quarter of the attention layers, achieves performance comparable to the original Transformer in chat benchmarks and outperforms open-source hybrid Mamba models trained from scratch with trillions of tokens in both chat benchmarks and general benchmarks. Moreover, we introduce a hardware-aware speculative decoding algorithm that accelerates the inference speed of Mamba and hybrid models. Overall we show how, with limited computation resources, we can remove many of the original attention layers and generate from the resulting model more efficiently. Our top-performing model, distilled from Llama3-8B-Instruct, achieves a 29.61 length-controlled win rate on AlpacaEval 2 against GPT-4 and 7.35 on MT-Bench, surpassing the best 8B scale instruction-tuned linear RNN model. We also find that the distilled model has natural length extrapolation, showing almost perfect accuracy in the needle-in-a-haystack test at 20x the distillation length. Code and pre-trained checkpoints are open-sourced at MambaInLlama for distillation and SpeculativeMamba for speculative decoding.
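The abstract above mentions speculative decoding as the means of accelerating generation. The sketch below shows only the generic greedy form of that idea: a cheap draft model proposes a few tokens and the large target model verifies them. It is not the paper's hardware-aware algorithm. The names `speculative_decode`, `toy_target`, and `toy_draft`, and the toy integer "models", are hypothetical and exist only for illustration.

```python
from typing import Callable, List

Token = int
# Assumption: a "model" is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding; the result matches plain greedy decoding with `target`."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1) The draft model proposes up to k tokens autoregressively.
        n = min(k, goal - len(out))
        proposal: List[Token] = []
        for _ in range(n):
            proposal.append(draft(out + proposal))
        # 2) The target model checks each proposed token. A real system would
        #    score all positions in one batched pass; this loop is for clarity.
        for i in range(n):
            expected = target(out + proposal[:i])
            if expected != proposal[i]:
                # First mismatch: keep the accepted prefix plus the target's token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # Every proposed token was accepted.
            out.extend(proposal)
    return out


# Toy stand-ins (hypothetical): the target counts upward modulo 10,
# and the draft agrees except right after a 4.
def toy_target(seq: List[Token]) -> Token:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[Token]) -> Token:
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


result = speculative_decode(toy_target, toy_draft, [0], max_new=8)
```

Because every emitted token is either confirmed or supplied by the target, the output is identical to decoding with the target alone. The saving in a real system comes from verifying several drafted tokens in a single target pass.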

large language model, machine learning, natural language, (11 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.59)

Automatically Learning Hybrid Digital Twins of Dynamical Systems

Samuel Holt

Neural Information Processing Systems | Feb-16-2026, 07:25:36 GMT

However, existing approaches to digital twins (DTs) often struggle to generalize to unseen conditions in data-scarce settings, a crucial requirement for such models. To address these limitations, our work begins by establishing the essential desiderata for effective DTs.

artificial intelligence, machine learning, natural language, (20 more...)

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.13)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)
Workflow (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Epidemiology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(5 more...)

Combining Generative and Discriminative Models for Hybrid Inference

Victor Garcia Satorras, Zeynep Akata, Max Welling

Neural Information Processing Systems | Feb-13-2026, 10:03:49 GMT

Neural Information Processing Systems http://nips.cc/

graphical model, inference, neural network, (13 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > Michigan (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Hybrid Models for Learning to Branch

Neural Information Processing Systems | Feb-10-2026, 12:31:11 GMT

In this work, we ask two key questions. First, in a more realistic setting where only a CPU is available, is the GNN model still competitive?

machine learning, natural language, node, (18 more...)

Country:

North America > United States (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
(2 more...)

1c104b9c0accfca52ef21728eaf01453-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-7-2026, 16:46:01 GMT

learning, library, program induction, (15 more...)

Genre: Research Report > New Finding (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.73)

Sparse Modular Activation for Efficient Sequence Modeling

Neural Information Processing Systems | Dec-24-2025, 20:14:54 GMT

Recent hybrid models combining Linear State Space Models (SSMs) with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks. However, current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs. To address this limitation, we introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely and dynamically activate sub-modules for sequence elements in a differentiable manner. Through allowing each element to skip non-activated sub-modules, SMA reduces computation and memory consumption of neural networks at both training and inference stages. To validate the effectiveness of SMA on sequence modeling, we design a novel neural architecture, SeqBoat, which employs SMA to sparsely activate a Gated Attention Unit (GAU) based on the state representations learned from an SSM. By constraining the GAU to only conduct local attention on the activated inputs, SeqBoat can achieve linear inference complexity with theoretically infinite attention span, and provide substantially better quality-efficiency trade-off than the chunking-based models. With experiments on a wide range of tasks, including long sequence modeling, speech classification and language modeling, SeqBoat brings new state-of-the-art results among hybrid models with linear complexity, and reveals the amount of attention needed for each task through the learned sparse activation patterns. Our code is publicly available at https://github.com/renll/SeqBoat.
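The skipping behaviour described above, where each sequence element can bypass a non-activated sub-module, can be pictured as a gather, compute, scatter pattern. The sketch below shows only that pattern under simplifying assumptions: the gate is a hard yes/no function and nothing is differentiable, so it is not SMA or SeqBoat itself. The names `sparse_activate`, `gate`, and `module` are hypothetical.

```python
from typing import Callable, List, Sequence


def sparse_activate(xs: Sequence[float],
                    gate: Callable[[float], bool],
                    module: Callable[[List[float]], List[float]]) -> List[float]:
    """Run `module` only on the elements selected by `gate`; the rest skip it."""
    # Gather: indices of the activated elements (the compressed sub-sequence).
    active = [i for i, x in enumerate(xs) if gate(x)]
    # Compute: the expensive sub-module sees only the activated elements.
    processed = module([xs[i] for i in active])
    # Scatter: write results back; non-activated elements pass through unchanged.
    out = list(xs)
    for i, y in zip(active, processed):
        out[i] = y
    return out


# Toy usage (hypothetical gate and module): only positive inputs are doubled.
doubled = sparse_activate([1.0, -2.0, 3.0],
                          gate=lambda x: x > 0,
                          module=lambda v: [2 * a for a in v])
```

The cost of `module` scales with the number of activated elements rather than with the full sequence length, which is where the compute and memory savings described in the abstract would come from.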

efficient sequence modeling, name change, sparse modular activation, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.82)

Hybrid Models for Learning to Branch

Neural Information Processing Systems | Dec-24-2025, 16:21:59 GMT

A recent Graph Neural Network (GNN) approach for learning to branch has been shown to successfully reduce the running time of branch-and-bound algorithms for Mixed Integer Linear Programming (MILP). While the GNN relies on a GPU for inference, MILP solvers are purely CPU-based. This severely limits its application as many practitioners may not have access to high-end GPUs. In this work, we ask two key questions. First, in a more realistic setting where only a CPU is available, is the GNN model still competitive? Second, can we devise an alternate computationally inexpensive model that retains the predictive power of the GNN architecture? We answer the first question in the negative, and address the second question by proposing a new hybrid architecture for efficient branching on CPU machines. The proposed architecture combines the expressive power of GNNs with computationally inexpensive multi-layer perceptrons (MLP) for branching. We evaluate our methods on four classes of MILP problems, and show that they lead to up to 26% reduction in solver running time compared to state-of-the-art methods without a GPU, while extrapolating to harder problems than it was trained on.

Real-time Air Pollution prediction model based on Spatiotemporal Big data

Van-Duc Le, Tien-Cuong Bui, Sang Kyun Cha

arXiv.org Artificial Intelligence | Dec-9-2025

Air pollution is one of the most pressing concerns for urban areas. Many countries have built monitoring stations that collect pollution values hourly. A recent research project in Daegu, Korea, monitors air quality in real time using sensors installed on taxis that run across the whole city. The collected data is large (sampled at 1-second intervals) and has both spatial and temporal dimensions. In this paper, based on this spatiotemporal big data, we propose a real-time air pollution prediction model that applies a Convolutional Neural Network (CNN) to the image-like spatial distribution of air pollution. To handle the temporal information in the data, we combine a Long Short-Term Memory (LSTM) unit for the time series with a neural network model for other factors that affect air pollution, such as weather conditions, to build a hybrid prediction model. The model is simple in architecture yet still delivers good prediction ability.

artificial intelligence, deep learning, machine learning, (15 more...)

1805.00432

Country: Asia > South Korea > Daegu > Daegu (0.29)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Recurrent Neural Networks with Linear Structures for Electricity Price Forecasting

Souhir Ben Amor, Florian Ziel

arXiv.org Machine Learning | Dec-5-2025

We present a novel recurrent neural network architecture designed explicitly for day-ahead electricity price forecasting, aimed at improving short-term decision-making and operational management in energy systems. Our combined forecasting model embeds linear structures, such as expert models and Kalman filters, into recurrent networks, enabling efficient computation and enhanced interpretability. The design leverages the strengths of both linear and non-linear model structures, allowing it to capture all relevant stylised price characteristics in power markets, including calendar and autoregressive effects, as well as influences from load, renewable energy, and related fuel and carbon markets. For empirical testing, we use hourly data from the largest European electricity market spanning 2018 to 2025 in a comprehensive forecasting study, comparing our model against state-of-the-art approaches, particularly high-dimensional linear and neural network models. The proposed model achieves approximately 12% higher accuracy than leading benchmarks. We evaluate the contributions of the interpretable model components and conclude on the impact of combining linear and non-linear structures.

electricity price, forecasting, price forecasting, (16 more...)

2512.0469

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
Europe > Germany (0.04)
Asia > China (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Energy > Power Industry (1.00)
Energy > Renewable > Solar (0.93)
Energy > Renewable > Wind (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Integration of LSTM Networks in Random Forest Algorithms for Stock Market Trading Predictions

Juan C. King, Jose M. Amigo

arXiv.org Artificial Intelligence | Dec-3-2025

The aim of this paper is the analysis and selection of stock trading systems that combine different models with data of different nature, such as financial and microeconomic information. Specifically, based on previous work by the authors and applying advanced techniques of Machine Learning and Deep Learning, our objective is to formulate trading algorithms for the stock market with empirically tested statistical advantages, thus improving results published in the literature. Our approach integrates Long Short-Term Memory (LSTM) networks with algorithms based on decision trees, such as Random Forest and Gradient Boosting. While the former analyze price patterns of financial assets, the latter are fed with economic data of companies. Numerical simulations of algorithmic trading with data from international companies and 10-weekday predictions confirm that an approach based on both fundamental and technical variables can outperform the usual approaches, which do not combine those two types of variables. In doing so, Random Forest turned out to be the best performer among the decision trees. We also discuss how the prediction performance of such a hybrid approach can be boosted by selecting the technical variables.

artificial intelligence, hybrid model, machine learning, (19 more...)

doi: 10.3390/forecast7030049

2512.02036

Country:

Europe (0.46)
Asia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Financial News (0.95)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
